Limma mixed models feature #6753

KamilMaliszArdigen · 2024-10-08T11:03:04Z

This extends the limma module with an option to process RNA-seq data in pairwise and mixed-model modes. It will be used in the differentialabundance pipeline and can later be used in other pipelines that use limma.

PR checklist

Closes #XXX

maxulysse · 2024-10-08T11:44:34Z

modules/nf-core/limma/differential/main.nf

@@ -4,17 +4,19 @@ process LIMMA_DIFFERENTIAL {

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-        'https://depot.galaxyproject.org/singularity/bioconductor-limma:3.54.0--r42hc0cfd56_0' :
-        'biocontainers/bioconductor-limma:3.54.0--r42hc0cfd56_0' }"
+        'oras://community.wave.seqera.io/library/bioconductor-edger_bioconductor-ihw_bioconductor-limma_r-dplyr_r-readr:7fc48564d286c1c6' :


We shouldn't have any oras protocol in the modules; it doesn't work with NXF_SINGULARITY_CACHEDIR.
Can you switch to https?

modules/nf-core/limma/differential/environment.yml

maxulysse · 2024-10-08T11:48:58Z

modules/nf-core/limma/differential/main.nf

-        'https://depot.galaxyproject.org/singularity/bioconductor-limma:3.54.0--r42hc0cfd56_0' :
-        'biocontainers/bioconductor-limma:3.54.0--r42hc0cfd56_0' }"
+        'oras://community.wave.seqera.io/library/bioconductor-edger_bioconductor-ihw_bioconductor-limma_r-dplyr_r-readr:7fc48564d286c1c6' :
+        'community.wave.seqera.io/library/bioconductor-edger_bioconductor-ihw_bioconductor-limma_r-dplyr_r-readr:edea0f9fbaeba3a0' }"


Suggested change

'community.wave.seqera.io/library/bioconductor-edger_bioconductor-ihw_bioconductor-limma_r-dplyr_r-readr:edea0f9fbaeba3a0' }"

'nf-core/bioconductor-edger_bioconductor-ihw_bioconductor-limma_r-dplyr_r-readr:edea0f9fbaeba3a0' }"

I pulled the image and pushed it to quay.io.

That way we can change the registry for all containers in a simpler way.
We'll be using community.wave.seqera.io as the registry when we do the switch.

I'm not keen on this @maxulysse, there's no way for someone without privilege to do this and maintain the module.

pinin4fjords

There are a few changes I'd like to see here.

I understand the attraction of readr etc, but until we have full Seqera Containers integration with nf-core tooling (which shouldn't be long), extra packages create challenges.

The forking of logic is also a bit ugly. If the logic is sufficiently different to require two scripts it should be two modules- though I'd favour integration if it didn't create excess complexity in the R code.

pinin4fjords · 2024-10-08T11:45:41Z

modules/nf-core/limma/differential/environment.yml

 dependencies:
-  - bioconda::bioconductor-limma=3.54.0
+- bioconda::bioconductor-edger=4.0.16


This multi-package thing still creates difficulties right now- which is why I wrote this module depending on a single Biocontainer.

pinin4fjords · 2024-10-08T11:46:01Z

modules/nf-core/limma/differential/main.nf

@@ -4,17 +4,19 @@ process LIMMA_DIFFERENTIAL {

    conda "${moduleDir}/environment.yml"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
-        'https://depot.galaxyproject.org/singularity/bioconductor-limma:3.54.0--r42hc0cfd56_0' :
-        'biocontainers/bioconductor-limma:3.54.0--r42hc0cfd56_0' }"
+        'oras://community.wave.seqera.io/library/bioconductor-edger_bioconductor-ihw_bioconductor-limma_r-dplyr_r-readr:7fc48564d286c1c6' :


These oras links don't work well with nf-core tooling right now, unfortunately

pinin4fjords · 2024-10-08T11:56:59Z

modules/nf-core/limma/differential/main.nf

@@ -23,5 +25,10 @@ process LIMMA_DIFFERENTIAL {
    task.ext.when == null || task.ext.when

    script:
-    template 'limma_de.R'
+    if (type == 'rnaseq') {
+        template 'limma_de_rnaseq.R'


I'm not massively keen on the parallel script.

Either the new thing should be a separate module, or they should be properly integrated.

How do you envision "proper" integration? As a single template? I will refactor that if needed.

I mean, if the two scripts share enough logic, they should be one with a simple conditional. If the logic is quite divergent such that doing that adds too much complexity, these should be separate modules.

Well, I had similar doubts. On the one hand this is still limma and differential abundance analysis; on the other hand this one is combined with voom and there are a lot of differences. So maybe we should create a new module named limma/differential-voom, or simply limma/voom? What are your thoughts on the module naming?

I think this should be limma/differential-voom.

As in the comments below, I also think this should be, as much as possible, a thin wrapper around Limma functions only. Otherwise it's just your custom script, rather than a 'standard' Limma module.

Updating this older comment after the newer ones below: I don't believe the additional logic required for Voom merits its own module. We can add the Voom part in a conditional, and the other changes (e.g. duplicateCorrelation) apply equally to non-Voom Limma.

pinin4fjords · 2024-10-10T09:01:32Z

modules/nf-core/limma/differential/environment.yml

-  - bioconda::bioconductor-limma=3.54.0
+- bioconda::bioconductor-edger=4.0.16
+- bioconda::bioconductor-ihw=1.28.0
+- bioconda::bioconductor-limma=3.58.1


Limma is a dependency of edger, you don't need to include it here

That is true; however, without this we won't be able to control which limma version is used.

We only pin the primary dependency in the modules repo, unless you have a really compelling reason to do otherwise. We will at some point have lock files that pin all dependencies.

(as in other comments, I'm not sure we need edgeR here, so we'd only pin Limma)

Well, we noticed that results differ slightly depending on the limma version, so we decided to pin it for reproducibility. I will adopt your comment in the standalone module.

Actually, I see your point on this one, apologies. Limma is the primary dependency so it's pinned, but we need edgeR.

pinin4fjords · 2024-10-10T09:16:28Z

modules/nf-core/limma/differential/templates/limma_de_rnaseq.R

+dim(DGE)
+
+# Calculate normalization factors
+DGE <- calcNormFactors(DGE)


Please correct me if I'm wrong, but this and the DGEList are the only things that need edgeR here, right? voom() will do the normalisation things, and will just take a matrix. Then we don't need the edgeR dependency.

We use the edgeR dependency to read the raw count matrix and create a DGEList object, which is one of the accepted ways to provide expression data to voom: "counts - a numeric matrix containing raw counts, or an ExpressionSet containing raw counts, or a DGEList object." : https://www.rdocumentation.org/packages/limma/versions/3.28.14/topics/voom

So we need either the Biobase or the edgeR dependency.

~~Yep, as I say, we don't need edgeR, and that will simplify the software dependencies~~

Actually, after reminding myself, I take this back; I think it probably does need the DGEList etc. the way you're doing it.

pinin4fjords · 2024-10-10T09:18:55Z

modules/nf-core/limma/differential/templates/limma_de_rnaseq.R

+## Generate outputs
+# Differential expression table
+
+if (opt\$IHW_correction) {


IMO if we're writing 'limma' modules they should be thin-as-possible wrappers around Limma functions. We shouldn't be embedding lots of additional logic, aside from the boilerplate required to pass inputs to those functions correctly and write the outputs.

Extensive customisations should probably be in local modules in the workflow, and we'd want to discuss to determine if this was part of standard best practice.

pinin4fjords

I've just had some time to review this again, and I think the R part also needs work. There's no need to treat the pairing variable in a special way as far as I can tell, it can be a blocking variable like any other. At that point, the main difference between the two modes is the use of duplicateCorrelation, for which there could just be a separate flag.

Fundamentally, this would integrate much better into the existing differentialabundance logic if started from the existing limma module.

I actually don't think the logic is very different with using Voom, so for basic Voom integration the existing module can just have a conditional in it something like this, about here:

if (!is.null(opt$use_voom) && opt$use_voom) {
    # Create a DGEList object for RNA-seq data
    dge <- DGEList(counts = intensities.table)
    
    # Normalize counts using TMM
    dge <- calcNormFactors(dge, method = "TMM")
    
    # Run voom to transform the data
    voom_result <- voom(dge, design)
    data_for_fit <- voom_result
} else {
    # Use as.matrix for regular microarray analysis
    data_for_fit <- as.matrix(intensities.table)
}

All the other logic can be the same, and then we can add in any features you think are missing from there.
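
The earlier point in this comment, that sample pairing can be an ordinary blocking variable and that duplicateCorrelation can sit behind its own flag, might look roughly like the sketch below. This is only an illustration on simulated data, not code from the module: the flag name `use_duplicate_correlation`, the `patient` and `treatment` factors and the toy matrix are all assumptions made up for the example, and `duplicateCorrelation()` additionally needs the statmod package installed.

```r
library(limma)  # duplicateCorrelation() also requires the 'statmod' package

set.seed(1)

# Simulated toy data (assumption): 100 features, 4 patients x 2 conditions
n_features <- 100
patient    <- factor(rep(paste0("P", 1:4), each = 2))
treatment  <- factor(rep(c("control", "treated"), times = 4))
expr <- matrix(
    rnorm(n_features * 8), nrow = n_features,
    dimnames = list(paste0("gene", seq_len(n_features)), paste0("S", 1:8))
)

# Option A: pairing is just another blocking factor in the design (fixed effect)
design_fixed <- model.matrix(~ patient + treatment)
fit_fixed    <- eBayes(lmFit(expr, design_fixed))

# Option B: pairing as a random effect, enabled by a separate (hypothetical) flag
use_duplicate_correlation <- TRUE
design_mixed <- model.matrix(~ treatment)
if (use_duplicate_correlation) {
    corfit <- duplicateCorrelation(expr, design_mixed, block = patient)
    fit_mixed <- lmFit(
        expr, design_mixed,
        block = patient, correlation = corfit$consensus.correlation
    )
} else {
    fit_mixed <- lmFit(expr, design_mixed)
}
fit_mixed <- eBayes(fit_mixed)

# Same downstream call in both cases
top_fixed <- topTable(fit_fixed, coef = "treatmenttreated", number = Inf)
top_mixed <- topTable(fit_mixed, coef = "treatmenttreated", number = Inf)
```

In both options everything after `lmFit()` is identical, which is the argument for one module with a flag rather than a separate mode with duplicated code.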

pinin4fjords · 2024-10-12T08:25:00Z

modules/nf-core/limma/differential/templates/limma_de_rnaseq.R

+# Calculate normalization factors
+DGE <- calcNormFactors(DGE)
+
+if (opt\$analysis_type == "pairwise") {


Suggested change

if (opt\$analysis_type == "pairwise") {

if (opt\$analysis_type == "pairwise") {

I don't think you even need this extra mode. Sample pairing can be handled as a blocking variable, no? Even if it was needed, you're duplicating a lot of code between the two modes.

Well, we are duplicating some code. Unfortunately, the mixed model integration involves a large number of small additions, which resulted in multiple ifs; one example is voom, which is called twice. I will try to incorporate voom into the original script. I'm not yet fully convinced about the mixedmodel mode.

pinin4fjords · 2024-10-14T09:44:26Z

Just to re-state the above, and after refreshing myself on the Voom logic, I actually don't think Voom needs to be its own module. I think this can be integrated into the existing module.

The voom-specific component is a matter of a few lines. All the other changes are not Voom specific.

KamilMaliszArdigen · 2024-10-14T22:12:05Z

I missed your earlier comment. I will take a closer look at this one and figure out how to incorporate mixedmodel in a nice, clean way.

At least, thanks to the new module, we have the nf-test almost ready :D

KamilMaliszArdigen · 2024-10-15T15:05:59Z

@pinin4fjords I've extended the original limma module with voom and IHW correction. This is still a draft, as my results are now significantly different and this still has to be sorted out. I wanted to ask whether you feel this is the right direction. Any suggestions regarding incorporation of the mixed models logic would also be welcome :)

Thank you in advance.

pinin4fjords

Thank you for the ongoing work- this is definitely going the right way.

But I still think the IHW code doesn't belong in a module wrapping standard Limma functions.

pinin4fjords · 2024-10-15T16:58:00Z

modules/nf-core/limma/differential/templates/limma_de.R

@@ -322,18 +357,54 @@ ebayes_args = c(

 fit2 <- do.call(eBayes, ebayes_args)

-# Run topTable() to generate a results data frame
+if ((!is.null(opt\$use_voom) && opt\$use_voom) && (!is.null(opt\$IHW_correction) && opt\$IHW_correction)) {


I'm sorry, but I don't think the IHW correction belongs in this module.

nf-core modules should be thin wrappers around underlying software. We have some unavoidable boilerplate when we're wrapping library code like this, but we shouldn't be adding to that with our own judgement calls- it makes the code harder to maintain, and harder for users expecting Limma-native functionality to understand.

pinin4fjords · 2024-10-15T17:07:22Z

modules/nf-core/limma/differential/templates/limma_de.R

+    data_for_fit <- voom_result
+
+    # Write the normalized counts matrix to a TSV file
+    normalized_counts <- voom_result\$E


If we're adding export of normalised values (which I think is a good thing) we should do the same for the microarray case (e.g. with normalizeBetweenArrays), and do the writing outside this conditional.
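
As a rough sketch of that suggestion: each branch produces a normalised matrix and the table is written once, after the conditional. The data are simulated and the names `use_voom`, `normalised` and `out_file` are hypothetical, not taken from the module; whether the microarray fit itself should also use the normalised matrix is left open here.

```r
library(limma)
library(edgeR)

set.seed(42)
use_voom <- TRUE  # hypothetical stand-in for opt$use_voom

samples <- paste0("S", 1:6)
group   <- factor(rep(c("control", "treated"), each = 3))
design  <- model.matrix(~ group)

# Simulated input (assumption): counts for voom, log-intensities otherwise
intensities.table <- if (use_voom) {
    matrix(rnbinom(600, mu = 100, size = 5), nrow = 100)
} else {
    matrix(rnorm(600, mean = 8), nrow = 100)
}
dimnames(intensities.table) <- list(paste0("gene", 1:100), samples)

if (use_voom) {
    # RNA-seq: TMM factors, then voom; voom's E slot holds log2-CPM values
    dge <- calcNormFactors(DGEList(counts = intensities.table), method = "TMM")
    data_for_fit <- voom(dge, design)
    normalised   <- data_for_fit$E
} else {
    # Microarray: fit input unchanged, quantile-normalised copy for export
    data_for_fit <- as.matrix(intensities.table)
    normalised   <- normalizeBetweenArrays(data_for_fit, method = "quantile")
}

# Single write, outside the conditional, shared by both branches
out_file <- file.path(tempdir(), "normalised_counts.tsv")  # hypothetical name
write.table(
    data.frame(gene_id = rownames(normalised), normalised, check.names = FALSE),
    file = out_file, sep = "\t", quote = FALSE, row.names = FALSE
)
```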

KamilMaliszArdigen and others added 2 commits October 8, 2024 12:17

Limma mixed models feature

9795e09

Merge branch 'master' into limma-update

2f217b0

pinin4fjords self-requested a review October 8, 2024 11:41

maxulysse reviewed Oct 8, 2024

maxulysse reviewed Oct 8, 2024

pinin4fjords requested changes Oct 8, 2024

Test fix

bc5d128

pinin4fjords reviewed Oct 10, 2024

pinin4fjords mentioned this pull request Oct 11, 2024

Add limma for rnaseq nf-core/differentialabundance#286

Open

11 tasks

pinin4fjords requested changes Oct 12, 2024

Limma voom moved to new module

96fc5e1

KamilMaliszArdigen added 2 commits October 14, 2024 23:13

Limma voom nf-tests

2dcfb12

Limma voom nf-tests

f557caa

Extension of limma module with voom and IHW correction

2e008b0

pinin4fjords reviewed Oct 15, 2024
